AITopics | intrinsic confidence

intrinsic confidence

Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?

Yona, Gal, Aharoni, Roee, Geva, Mor

arXiv.org Artificial Intelligence, May-27-2024

We posit that large language models (LLMs) should be capable of expressing their intrinsic uncertainty in natural language. For example, if the LLM is equally likely to output two contradicting answers to the same question, then its generated response should reflect this uncertainty by hedging its answer (e.g., "I'm not sure, but I think..."). We formalize faithful response uncertainty based on the gap between the model's intrinsic confidence in the assertions it makes and the decisiveness by which they are conveyed. This example-level metric reliably indicates whether the model reflects its uncertainty, as it penalizes both excessive and insufficient hedging. We evaluate a variety of aligned LLMs at faithfully communicating uncertainty on several knowledge-intensive question answering tasks. Our results provide strong evidence that modern LLMs are poor at faithfully conveying their uncertainty, and that better alignment is necessary to improve their trustworthiness.
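The gap-based metric described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each assertion in a response already has an intrinsic confidence score and a decisiveness score in [0, 1], and it takes the gap to be the mean absolute difference between the two. The function name `faithfulness` and both parameter names are hypothetical.

```python
from typing import Sequence


def faithfulness(confidences: Sequence[float], decisiveness: Sequence[float]) -> float:
    """Return 1 minus the mean absolute gap between decisiveness and confidence.

    Assumption: both inputs hold one score in [0, 1] per assertion in the
    response. How those scores are obtained is outside this sketch.
    """
    if len(confidences) != len(decisiveness):
        raise ValueError("need one decisiveness score per confidence score")
    if not confidences:
        raise ValueError("response contains no assertions to score")
    # The absolute gap penalizes both directions: sounding more decisive than
    # the model's confidence warrants, and hedging more than it warrants.
    gaps = [abs(d - c) for c, d in zip(confidences, decisiveness)]
    return 1.0 - sum(gaps) / len(gaps)


# A fully decisive answer from a model that is only 50% confident
overconfident = faithfulness([0.5], [1.0])
# A hedged answer whose decisiveness matches the 50% confidence
matched = faithfulness([0.5], [0.5])
# A heavily hedged answer from a model that is fully confident
overhedged = faithfulness([1.0], [0.5])
```

Under these assumptions the matched response scores highest, while the overconfident and overhedged responses are penalized equally, which mirrors the abstract's point that the metric penalizes both excessive and insufficient hedging.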

assertion, decisiveness, preprint arxiv, (16 more...)

2405.16908

Country:

North America > United States > Alabama (0.04)
North America > Mexico (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media (1.00)
Government > Regional Government > North America Government > United States Government (0.69)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Methods to Estimate Large Language Model Confidence

Kotelanski, Maia, Gallo, Robert, Nayak, Ashwin, Savage, Thomas

arXiv.org Artificial Intelligence, Dec-8-2023

Large Language Models have difficulty communicating uncertainty, which is a significant obstacle to applying LLMs to complex medical tasks. This study evaluates methods to measure LLM confidence when suggesting a diagnosis for challenging clinical vignettes. GPT4 was asked a series of challenging case questions using Chain of Thought (CoT) and Self Consistency (SC) prompting. Multiple methods were investigated to assess model confidence and evaluated on their ability to predict the model's observed accuracy. The methods evaluated were Intrinsic Confidence, SC Agreement Frequency and CoT Response Length. SC Agreement Frequency correlated with observed accuracy, yielding a higher Area under the Receiver Operating Characteristic Curve compared to Intrinsic Confidence and CoT Length analysis. SC agreement is the most useful proxy for model confidence, especially for medical diagnosis. Model Intrinsic Confidence and CoT Response Length exhibit a weaker ability to differentiate between correct and incorrect answers, preventing them from being reliable and interpretable markers for model confidence. We conclude GPT4 has a limited ability to assess its own diagnostic accuracy. SC Agreement Frequency is the most useful method to measure GPT4 confidence.
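The two quantities the abstract relies on, agreement frequency across self-consistency samples and the area under the ROC curve, can be illustrated with a short sketch. This is not the study's code: the function names `sc_agreement` and `auroc` are hypothetical, answers are assumed to be comparable after simple lowercasing and whitespace stripping, and the AUROC is computed with the standard pairwise (rank-based) definition rather than any particular library.

```python
from collections import Counter
from typing import Sequence, Tuple


def sc_agreement(samples: Sequence[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it.

    Assumption: answers match after lowercasing and stripping whitespace.
    Real diagnoses would likely need more careful normalization.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(s.strip().lower() for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a correct answer gets a higher score than an incorrect one.

    Ties between a correct and an incorrect answer count as half a win.
    """
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Four hypothetical sampled diagnoses for one vignette
answer, agreement = sc_agreement(["Sepsis", "sepsis", "Lupus", "sepsis "])

# Hypothetical agreement scores for four vignettes and whether each answer was right
separation = auroc([0.9, 0.8, 0.3, 0.2], [True, True, False, False])
```

An AUROC near 1.0 would mean the confidence score separates correct from incorrect answers well, and a value near 0.5 would mean it does no better than chance; the data above are made up purely to exercise the functions.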

agreement frequency, diagnosis, intrinsic confidence, (14 more...)

2312.03733

Country:

North America > United States > Massachusetts (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Providers & Services (0.95)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.31)
